in_node_exporter_metrics: add support for thermal_zone. #7522

Merged
merged 13 commits into from
Dec 20, 2023

Conversation

pwhelan
Contributor

@pwhelan pwhelan commented Jun 6, 2023

Summary

This patch adds support for reading temperature values from /sys/class/thermal/thermal_zone*. The labels replicate the behaviour of the Prometheus node_exporter.

This plugin provides thermal reporting on linux/arm64, especially for the Raspberry Pi 4 and similar SBCs.

These sensors are already implemented in the Prometheus node_exporter: https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/thermal_zone_linux.go#L31.
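For readers unfamiliar with the sysfs layout, here is a minimal Python sketch of the kind of read the summary describes. It is not the plugin's C code: the function name `read_thermal_zones` and the `sysfs_root` parameter are hypothetical, and skipping unreadable zones is an assumption. It relies only on the kernel convention that each `thermal_zone<N>` directory exposes a `type` file and a `temp` file in millidegrees Celsius.

```python
import glob
import os
import re


def read_thermal_zones(sysfs_root="/sys"):
    """Return a list of (zone, type, celsius) tuples, one per readable zone."""
    pattern = os.path.join(sysfs_root, "class", "thermal", "thermal_zone[0-9]*")
    zones = []
    for zone_dir in sorted(glob.glob(pattern)):
        match = re.search(r"thermal_zone(\d+)$", zone_dir)
        if match is None:
            continue
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone_dir, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            # Assumption: a zone that cannot be read is skipped, not fatal.
            continue
        # The kernel reports temperatures in millidegrees Celsius.
        zones.append((match.group(1), zone_type, millidegrees / 1000.0))
    return zones


if __name__ == "__main__":
    for zone, zone_type, celsius in read_thermal_zones():
        print(f'node_thermal_zone_temp{{zone="{zone}",type="{zone_type}"}} = {celsius}')
```

Passing the sysfs root as a parameter mirrors the plugin's `path.sysfs` setting and makes the sketch easy to point at a fake directory tree.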


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@pwhelan pwhelan self-assigned this Jun 6, 2023
@pwhelan
Contributor Author

pwhelan commented Jun 6, 2023

Here is a run using valgrind:

valgrind ./bin/fluent-bit -i node_exporter_metrics -p metrics=thermal_zone -o stdout -m '*' -o exit -m '*' -f 1
==515970== Memcheck, a memory error detector
==515970== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==515970== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==515970== Command: ./bin/fluent-bit -i node_exporter_metrics -p metrics=thermal_zone -o stdout -m * -o exit -m * -f 1
==515970== 
Fluent Bit v2.1.5
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/06/06 19:46:51] [ info] [fluent bit] version=2.1.5, commit=3fdd42c6f2, pid=515970
[2023/06/06 19:46:51] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/06/06 19:46:51] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.procfs = /proc
[2023/06/06 19:46:51] [ info] [cmetrics] version=0.6.1
[2023/06/06 19:46:51] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.sysfs  = /sys
[2023/06/06 19:46:51] [ info] [ctraces ] version=0.3.1
[2023/06/06 19:46:51] [ info] [output:stdout:stdout.0] worker #0 started
[2023/06/06 19:46:51] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] initializing
[2023/06/06 19:46:51] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] storage_strategy='memory' (memory only)
[2023/06/06 19:46:51] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] thread instance initialized
[2023/06/06 19:46:51] [ info] [sp] stream processor started
[2023/06/06 19:46:56] [error] [/home/pwhelan/Projects/personal/fluent-bit/plugins/in_node_exporter_metrics/ne_utils.c:117 errno=61] No data available
2023-06-06T23:46:56.265368405Z node_thermal_zone_temp{zone="0",type="acpitz"} = 16.800000000000001
2023-06-06T23:46:56.265368405Z node_thermal_zone_temp{zone="1",type="acpitz"} = 16.800000000000001
2023-06-06T23:46:56.265368405Z node_thermal_zone_temp{zone="2",type="acpitz"} = 16.800000000000001
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="0",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="1",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="10",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="11",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="12",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="13",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="14",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="15",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="16",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="17",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="18",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="19",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="2",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="20",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="21",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="22",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="23",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="3",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="4",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="5",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="6",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="7",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="8",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_cur_state{name="9",type="Processor"} = 0
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="0",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="1",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="10",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="11",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="12",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="13",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="14",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="15",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="16",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="17",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="18",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="19",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="2",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="20",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="21",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="22",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="23",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="3",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="4",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="5",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="6",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="7",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="8",type="Processor"} = 10
2023-06-06T23:46:56.306184135Z node_cooling_device_max_state{name="9",type="Processor"} = 10
^C[2023/06/06 19:46:59] [engine] caught signal (SIGINT)
[2023/06/06 19:46:59] [ warn] [engine] service will shutdown in max 5 seconds
[2023/06/06 19:47:00] [ info] [engine] service has stopped (0 pending tasks)
[2023/06/06 19:47:00] [ warn] [input:node_exporter_metrics:node_exporter_metrics.0] Unknown metrics: thermal_zone
[2023/06/06 19:47:00] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/06/06 19:47:00] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==515970== 
==515970== HEAP SUMMARY:
==515970==     in use at exit: 0 bytes in 0 blocks
==515970==   total heap usage: 3,802 allocs, 3,802 frees, 2,640,178 bytes allocated
==515970== 
==515970== All heap blocks were freed -- no leaks are possible
==515970== 
==515970== For lists of detected and suppressed errors, rerun with: -s
==515970== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

There are no memory leaks as far as I can ascertain.
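To tally the metrics in a run like the one above, the stdout lines can be parsed with a short script. The sketch below is a reader's aid, not part of the PR. It assumes the line shape seen in this log (`<timestamp> <name>{<labels>} = <value>`), and the names `LINE_RE`, `LABEL_RE`, and `parse_metric_line` are hypothetical.

```python
import re

# Shape assumed from the log above: "<timestamp> <name>{k="v",...} = <value>"
LINE_RE = re.compile(
    r'^(?P<ts>\S+) (?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'\{(?P<labels>[^}]*)\} = (?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metric_line(line):
    """Return (name, labels, value) for a metric line, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # engine log lines, Valgrind banners, blank lines
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    import sys
    from collections import Counter

    # Count how many samples each metric name produced.
    counts = Counter()
    for raw in sys.stdin:
        parsed = parse_metric_line(raw)
        if parsed is not None:
            counts[parsed[0]] += 1
    for name, count in sorted(counts.items()):
        print(name, count)
```

Piping the captured output into the script prints one line per metric name with its sample count. Lines that are not metric samples are ignored.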

Contributor

@cosmo0920 cosmo0920 left a comment


Looks good to me. In my local box, this PR is working as expected.

Resolved review threads:

  • plugins/in_node_exporter_metrics/ne_thermalzone_linux.c (7 threads, outdated)
  • plugins/in_node_exporter_metrics/ne.c (1 thread, outdated)
  • plugins/in_node_exporter_metrics/ne_thermalzone_linux.h (1 thread, outdated)
  • plugins/in_node_exporter_metrics/ne_utils.c (1 thread)

@edsiper
Member

edsiper commented Jul 3, 2023

@pwhelan @leonardo-albertovich is this ready to go?

@pwhelan
Contributor Author

pwhelan commented Jul 7, 2023

> @pwhelan @leonardo-albertovich is this ready to go?

I just added the checks for the calls to flb_sds_cat_safe. I kept the variable names the same, since they follow the conventions used throughout ne_utils.c. I don't really understand what the original logic behind join_a and join_b was, so renaming them to something more sensible is beyond me. @cosmo0920 might have some insight there.

@edsiper As far as I'm concerned it's ready to go.

Contributor

@cosmo0920 cosmo0920 left a comment


I don't have much insight into the join_a and join_b logic either, but I used the same implementation in https://github.com/fluent/fluent-bit/pull/7522/files#diff-f32396faf19457b78c990f276acfbb6dde174a2ba49f99197ce827d312659641R212-R216.
This could prevent duplicating the sysfs mount point.
I checked the PR on Intel and AMD (Ryzen) platforms and it works well.
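One possible reading of "prevent duplicating a sysfs mount point" is that a path which already starts with the mount prefix should not have the prefix prepended again. The Python sketch below illustrates only that guess. The helper `join_under_mount` is hypothetical and does not reproduce the logic in ne_utils.c, which the participants themselves describe as unclear.

```python
def join_under_mount(mount, path):
    """Join path under mount, unless path already lives under mount.

    Hypothetical helper: illustrates one guess at the intent discussed
    above, not the actual join_a/join_b code in ne_utils.c.
    """
    mount = mount.rstrip("/")
    # Already prefixed with the mount point: return it unchanged.
    if path == mount or path.startswith(mount + "/"):
        return path
    # Otherwise prepend the mount point with exactly one separator.
    return mount + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(join_under_mount("/sys", "class/thermal/thermal_zone0"))
    print(join_under_mount("/sys", "/sys/class/thermal/thermal_zone0"))
```

Both calls in the usage example produce the same path, so a caller that passes an already-expanded glob result does not end up with the mount point repeated.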

@pwhelan
Contributor Author

pwhelan commented Aug 7, 2023

@leonardo-albertovich I'd be happy to open a new PR that renames join_a and join_b to something more understandable, but I think that falls outside the scope of this PR.

*** edit ***

I have already pushed changes that rename both of them to just label and use a single parameter.

@pwhelan pwhelan force-pushed the node-exporter-thermal_zone branch from 3a332c5 to e945bd1 Compare August 7, 2023 20:44
…oks possibly redundant with comments.

Signed-off-by: Phillip Whelan <[email protected]>
@cosmo0920
Contributor

This could be adapted to the more pluggable structure. @edsiper Are there any missing pieces before this can be merged?

@cosmo0920
Contributor

@pwhelan Any chance of adding the corresponding documentation for this PR?

@pwhelan
Contributor Author

pwhelan commented Nov 9, 2023

> This could be adapted to the more pluggable structure. @edsiper Are there any missing pieces before this can be merged?

I already adapted to @nokute78's new pluggable structure.

I added a documentation PR: fluent/fluent-bit-docs#1254. The documentation for the node_exporter_metrics plugin doesn't say much about its individual collectors.

pwhelan added a commit to fluent/fluent-bit-docs that referenced this pull request Nov 9, 2023
@pwhelan pwhelan requested a review from cosmo0920 November 22, 2023 18:45
@edsiper
Member

edsiper commented Dec 20, 2023

Thanks everyone, merging this now.

Note: I will squash the commits, since they all belong to the new thermal_zone collector functionality.

@edsiper edsiper merged commit 0ab459a into master Dec 20, 2023
45 checks passed
@edsiper edsiper deleted the node-exporter-thermal_zone branch December 20, 2023 21:26
patrick-stephens added a commit to fluent/fluent-bit-docs that referenced this pull request Dec 22, 2023
* in_node_exporter_metrics: add a reference to thermal_zone.

Related to fluent/fluent-bit#7522.

Signed-off-by: Phillip Whelan <[email protected]>

* in_node_exporter_metrics: update interval property for thermal_zone.

Add description for the thermal_zone interval configuration property.

Signed-off-by: Phillip Whelan <[email protected]>

---------

Signed-off-by: Phillip Whelan <[email protected]>
Signed-off-by: Pat <[email protected]>
Co-authored-by: Pat <[email protected]>
4 participants